A Unified Approach to Algorithms with a Suboptimality Test in Discounted Semi-Markov Decision Processes
Author
Abstract
This paper deals with computational algorithms for obtaining the optimal stationary policy and the minimum cost of a discounted semi-Markov decision process. Van Nunen [23] has proposed a modified policy iteration algorithm with a suboptimality test of MacQueen type, where the modified policy iteration algorithm is the policy iteration method whose policy evaluation routine is carried out by a finite number of iterations of successive approximations; it includes the method of successive approximations and the policy iteration method as special cases. This paper devises a modified policy iteration algorithm with a suboptimality test of Hastings and Mello type and proves that it constructs a finite sequence of policies whose last element is either a unique optimal policy or an ε-optimal policy. Moreover, a new notion of equivalent decision processes is introduced, and many iterative methods for solving a system of linear equations, such as the Jacobi method, the simultaneous overrelaxation method, the Gauss-Seidel method, the successive overrelaxation method, the stationary Richardson method, and so on, are shown to convert the original semi-Markov decision process into equivalent decision processes. Various transformed algorithms are derived by applying the modified policy iteration algorithm with the suboptimality test to those equivalent decision processes. Numerical comparisons are made for Howard's automobile replacement problem. They show that the modified policy iteration algorithm with the suboptimality test is much more efficient than van Nunen's algorithm and is superior to the policy iteration method, linear programming, and some transformed algorithms.
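To make the abstract's terminology concrete, the sketch below illustrates, in Python, a modified policy iteration loop for a discounted cost-minimizing MDP together with a MacQueen-type action-elimination (suboptimality) test. It is only an illustration under stated assumptions: the bound shown is the simpler MacQueen test rather than the Hastings-Mello refinement treated in the paper, the semi-Markov structure is assumed to have been reduced to an equivalent discounted MDP, and the function name, array layout, and parameters `m` and `eps` are hypothetical.

```python
import numpy as np

def modified_policy_iteration(P, c, beta, m=5, eps=1e-6, max_iter=10_000):
    """Modified policy iteration with a MacQueen-type suboptimality test
    for a finite, discounted, cost-minimizing MDP (a hypothetical sketch).

    P    : (A, S, S) array, P[a, s, s'] = transition probability
    c    : (A, S)    array, c[a, s]     = expected one-step cost
    beta : discount factor, 0 < beta < 1
    m    : number of successive-approximation sweeps per policy evaluation
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    active = np.ones((S, A), dtype=bool)       # actions not yet eliminated

    for _ in range(max_iter):
        # Policy improvement: one Bellman sweep restricted to active actions.
        Q = c.T + beta * (P @ v).T             # Q[s, a] evaluated at the current v
        Qa = np.where(active, Q, np.inf)
        pi = Qa.argmin(axis=1)
        v_new = Qa.min(axis=1)

        # MacQueen bounds: v* lies between v_new + lo and v_new + hi, componentwise.
        u = v_new - v
        lo = beta / (1.0 - beta) * u.min()
        hi = beta / (1.0 - beta) * u.max()
        if hi - lo <= eps:                     # the bounds bracket v* to within eps
            return pi, v_new + 0.5 * (lo + hi)

        # Suboptimality test: Q[s, a] + lo is a lower bound on the optimal Q*(s, a)
        # and v_new[s] + hi an upper bound on v*(s); actions whose lower bound
        # exceeds that upper bound cannot be optimal and are eliminated.
        active &= ~(Q + lo > (v_new + hi)[:, None])

        # Partial policy evaluation: m successive-approximation sweeps with pi fixed.
        v = v_new
        rows = np.arange(S)
        for _ in range(m):
            v = c[pi, rows] + beta * (P[pi, rows] @ v)

    return pi, v
```

Setting `m = 0` reduces this loop to the method of successive approximations, while letting `m` grow large reproduces the policy iteration method, which is the sense in which the abstract calls both special cases of modified policy iteration. A Gauss-Seidel or overrelaxation variant would replace the synchronous Bellman sweep with in-place, state-by-state updates; in the paper's terms, such a change amounts to running the same algorithm on an equivalent decision process.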
Similar Articles
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
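As a rough illustration of the decomposition idea described above (and not of the accelerated technique the cited paper itself proposes), the hypothetical sketch below partitions the transition graph of a discounted cost MDP into strongly connected components with networkx, processes the components sinks first, and solves each restricted MDP by plain value iteration while treating the already-computed values of downstream components as fixed.

```python
import numpy as np
import networkx as nx

def solve_by_scc_decomposition(P, c, beta, tol=1e-8):
    """Solve a discounted cost-minimizing MDP component by component
    (a hypothetical sketch of the generic SCC-decomposition idea).

    P : (A, S, S) transition probabilities, c : (A, S) one-step costs.
    """
    A, S, _ = P.shape

    # Directed graph of all transitions that are possible under some action.
    G = nx.DiGraph()
    G.add_nodes_from(range(S))
    for a in range(A):
        for s, s2 in zip(*np.nonzero(P[a] > 0)):
            G.add_edge(int(s), int(s2))

    cond = nx.condensation(G)                  # DAG whose nodes are the SCCs
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)

    # Sinks first: every transition leaving the current component then
    # points to states whose values have already been computed.
    for comp in reversed(list(nx.topological_sort(cond))):
        states = sorted(cond.nodes[comp]['members'])
        while True:                            # value iteration on the restricted MDP
            Q = c[:, states].T + beta * (P[:, states, :] @ v).T    # shape (|C|, A)
            v_new = Q.min(axis=1)
            delta = np.max(np.abs(v_new - v[states]))
            v[states] = v_new
            pi[states] = Q.argmin(axis=1)
            if delta <= tol:
                break
    return pi, v
```

Because every transition leaving a component lands in a component processed earlier, each restricted problem is a small self-contained discounted MDP, which is what makes combining the partial solutions level by level valid.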
Semi-Markov decision problems and performance sensitivity analysis
Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view; and perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (S...
Continuous Time Discounted Jump Markov Decision Processes: A Discrete-Event Approach
This paper introduces and develops a new approach to the theory of continuous time jump Markov decision processes (CTJMDP). This approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and eventually to discrete-time Markov decision processes (MDPs). The reduction is based on the equivalence of strategies that change actions between jumps and the randomized stra...
Risk-Sensitive Markov Control Processes
We introduce a unified framework to incorporate risk in Markov decision processes (MDPs), via prospect maps, which generalize the idea of coherent/convex risk measures in mathematical finance. Most of the existing risk-sensitive approaches in the literature on decision-making problems are contained in the framework as special instances. Within the framework, we solve the optima...
Model-Building Adaptive Critics for Semi-Markov Control
Adaptive (or actor) critics are a class of reinforcement learning algorithms. Generally, in adaptive critics, one starts with randomized policies and gradually updates the probability of selecting actions until a deterministic policy is obtained. Classically, these algorithms have been studied for Markov decision processes under model-free updates. Algorithms that build the model are often more...